A comparison of neural network feature transforms for speaker diarization

نویسندگان

Sree Harsha Yella

Andreas Stolcke

چکیده

Speaker diarization finds contiguous speaker segments in an audio stream and clusters them by speaker identity, without using a-priori knowledge about the number of speakers or enrollment data. Diarization typically clusters speech segments based on short-term spectral features. In prior work, we showed that neural networks can serve as discriminative feature transformers for diarization by training them to perform same/different speaker comparisons on speech segments, yielding improved diarization accuracy when combined with standard MFCC-based models. In this work, we explore a wider range of neural network architectures for feature transformation, by adding additional layers and nonlinearities, and by varying the objective function during training. We find that the original speaker comparison network can be improved by adding a nonlinear transform layer, and that further gains are possible by training the network to perform speaker classification rather than comparison. Overall we achieve relative reductions in speaker error between 18% and 34% on a variety of test data from the AMI, ICSI, and NIST-RT corpora.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker diarization of spontaneous meeting room conversations

Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization systems have isolated three main issues with the systems; overlapping speech, effects of background noise and speech/nonspeech detection errors on...

متن کامل

A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking

This paper investigates a novel neural scoring method, based on conventional i-vectors, to perform speaker diarization and linking of large collections of recordings. Using triplet loss for training, the network projects i-vectors in a space that better separates speakers in terms of cosine similarity. Experiments are run on two French TV collections built from REPERE [1] and ETAPE [2] campaign...

متن کامل

Detection and Handling of Overlapping Speech for Speaker Diarization

This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlationbased parameters for overlap detection on distant microphone channel data. Spatial features from di↵erent microphone pairs are fused by means of principal component analysis or by an approach in...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Speaker Diarization with LSTM

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of dvec...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

A comparison of neural network feature transforms for speaker diarization

نویسندگان

چکیده

منابع مشابه

Speaker diarization of spontaneous meeting room conversations

A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking

Detection and Handling of Overlapping Speech for Speaker Diarization

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Speaker Diarization with LSTM

عنوان ژورنال:

اشتراک گذاری